Converting Treebank Annotations to Language Neutral Syntax

Authors

  • Richard Campbell
  • Eric K. Ringger
Abstract

We describe the automatic conversion of English Penn Treebank (PTB) annotations into Language Neutral Syntax (LNS) (Campbell and Suzuki, 2002a,b). In this paper, we describe LNS and why it is useful, describe the conversion algorithm, present an evaluation of the conversion, and discuss some uses of the converted annotations and the potential for extending the coverage to other languages. The work described here is in the spirit of other automatic re-annotations of PTB trees (e.g. Frank, 2000 and Meyers, 2001), but differs in the nature of the output.

Similar resources

Converting Italian Treebanks: Towards an Italian Stanford Dependency Treebank

The paper addresses the challenge of converting MIDT, an existing dependency-based Italian treebank resulting from the harmonization and merging of smaller resources, into the Stanford Dependencies annotation formalism, with the final aim of constructing a standard-compliant resource for the Italian language. Achieved results include a methodology for converting treebank annotations belonging ...

A methodology for designing semantic annotations

This paper presents a methodology for designing languages for semantic annotation. Central in this methodology is the specification of representation formats as renderings of conceptual structures defined by an abstract syntax as set-theoretic constructs. An ideal representation format is defined as one that is able to represent all the conceptual distinctions made in the abstract syntax, and o...

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

Discourse-Level Annotation For Investigating Information Structure

We present discourse-level annotation of newspaper texts in German and English, as part of an ongoing project aimed at investigating information structure from a cross-linguistic perspective. Rather than annotating some specific notion of information structure, we propose a theory-neutral annotation of basic features at the levels of syntax, prosody and discourse, using treebank data as a start...

Utilizing Linguistic Resources

The Prague Dependency Treebank (henceforth PDT) is a large collection of texts in Czech. It contains several layers of rich annotation, ranging from morphology to deep syntax. It is unique in its size and theoretical background, especially for a language like Czech, which can be, with regard to the number of its speakers, considered a small language. In this article, we use PDT 2.0 to demonstra...

Publication date: 2004